Synchronization and Optimality for Multi-Armed Bandit Problems in Continuous Time
Authors
Abstract
We provide a complete solution to a general, continuous-time dynamic allocation (multi-armed bandit) problem with arms that are not necessarily independent or Markovian, using notions and results from time-changes, optimal stopping, and multi-parameter martingale theory. The independence assumption is replaced by the condition (F.4) of Cairoli & Walsh. We also introduce a synchronization identity for allocation strategies, which is necessary and sufficient for optimality in the case of decreasing rewards, and which leads to the explicit construction of a strategy with all the important properties: optimality in the dynamic allocation problem, optimality in a dual (minimization) problem, and the "index-type" property of Gittins.
∗ Research supported by the U.S. Army Research Office under Grant DAAH 04-95-I0528. We are grateful to Prof. Daniel Ocone for finding, and correcting, an error in our original proof of Theorem 7.1.
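For context, the "index-type" property refers to the Gittins index. In the continuous-time bandit literature this index admits a standard formulation, sketched here in notation of our own choosing (X_s is the state of a single arm, h its reward function, α > 0 a discount rate, τ ranges over stopping times after t, and F_t is the arm's filtration); this is an illustration, not the paper's exact construction:

\[
M(t) \;=\; \operatorname*{ess\,sup}_{\tau > t}\,
\frac{\mathbb{E}\!\left[\int_t^{\tau} e^{-\alpha s}\, h(X_s)\, ds \,\middle|\, \mathcal{F}_t \right]}
     {\mathbb{E}\!\left[\int_t^{\tau} e^{-\alpha s}\, ds \,\middle|\, \mathcal{F}_t \right]}
\]

An index-type strategy then engages, at each instant, an arm whose current index M(t) is maximal.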
Similar papers
On Optimality of Greedy Policy for a Class of Standard Reward Function of Restless Multi-armed Bandit Problem
In this paper, we consider the restless bandit problem, which is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. However, it is known to be PSPACE-hard to approximate to any non-trivial factor, so optimality is very difficult to obtain due to this high complexity. A natural method is to obtain the greedy policy considerin...
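As an illustration of such a greedy (myopic) policy, the following is a minimal sketch for a restless bandit with two-state Markov arms; all parameters (p11, p01, beliefs, rewards) are our own assumptions for illustration, not the model of the paper above:

import numpy as np

rng = np.random.default_rng(0)
n_arms = 5
p11 = rng.uniform(0.6, 0.9, n_arms)   # P(state stays "good"), assumed
p01 = rng.uniform(0.1, 0.4, n_arms)   # P(state flips "bad" -> "good"), assumed
belief = rng.uniform(0.0, 1.0, n_arms)  # belief each arm is in the good state
state = (rng.uniform(size=n_arms) < belief).astype(int)

total = 0.0
for t in range(1000):
    arm = int(np.argmax(belief))   # greedy: highest immediate expected reward
    total += state[arm]            # reward 1 iff the played arm is "good"
    obs = state[arm]
    belief = belief * p11 + (1 - belief) * p01   # one-step prediction for all arms
    belief[arm] = p11[arm] if obs else p01[arm]  # exact update for the observed arm
    # restless: every arm's state evolves whether or not it was played
    state = (rng.uniform(size=n_arms) < np.where(state == 1, p11, p01)).astype(int)

print(f"greedy policy, average reward: {total / 1000:.3f}")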
On the optimality of the Gittins index rule for multi-armed bandits with multiple plays
We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show...
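A minimal sketch of the selection step of such an index rule follows. Here gittins_index is a hypothetical placeholder (computing true Gittins indices requires solving an optimal-stopping problem per arm), so this only illustrates "operate the m projects with the highest indices":

import numpy as np

def gittins_index(arm_state):
    # hypothetical placeholder: a real implementation solves an
    # optimal-stopping problem for the arm's reward process
    return arm_state["mean_reward_estimate"]

def select_arms(arms, m):
    # index rule with multiple plays: operate the m projects whose
    # current indices are highest
    indices = np.array([gittins_index(a) for a in arms])
    return np.argsort(indices)[-m:][::-1]

arms = [{"mean_reward_estimate": r} for r in (0.2, 0.9, 0.5, 0.7)]
print(select_arms(arms, m=2))   # -> [1 3]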
On Optimality of Myopic Policy for Restless Multi-armed Bandit Problem with Non i.i.d. Arms and Imperfect Detection
We consider the channel access problem in a multi-channel opportunistic communication system with imperfect channel sensing, where the state of each channel evolves as a non-independent and identically distributed Markov process. This problem can be cast into a restless multi-armed bandit (RMAB) problem that is intractable due to its exponential computational complexity. A natural alternative is to ...
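To illustrate the flavor of a myopic policy under imperfect sensing, here is a minimal sketch with an assumed detection probability p_detect and a standard Bayes update; the model details are ours, not the paper's:

def bayes_update(belief_good, observed_good, p_detect):
    # posterior that the channel is good after a noisy observation; the
    # sensor reports "good" w.p. p_detect when good, 1 - p_detect when bad
    if observed_good:
        num = belief_good * p_detect
        den = num + (1 - belief_good) * (1 - p_detect)
    else:
        num = belief_good * (1 - p_detect)
        den = num + (1 - belief_good) * p_detect
    return num / den

def myopic_choice(beliefs):
    # sense the channel currently believed most likely to be good
    return max(range(len(beliefs)), key=lambda i: beliefs[i])

beliefs = [0.3, 0.8, 0.5]
ch = myopic_choice(beliefs)                      # -> 1
beliefs[ch] = bayes_update(beliefs[ch], observed_good=False, p_detect=0.9)
print(ch, round(beliefs[ch], 3))                 # belief drops after a "bad" reading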
Budgeted Bandit Problems with Continuous Random Costs
We study the budgeted bandit problem, where each arm is associated with both a reward and a cost. In a budgeted bandit problem, the objective is to design an arm pulling algorithm in order to maximize the total reward before the budget runs out. In this work, we study both multi-armed bandits and linear bandits, and focus on the setting with continuous random costs. We propose an upper confiden...
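A minimal sketch of one plausible upper-confidence-style rule for this setting, ranking arms by an optimistic reward-to-cost ratio, follows; the index form, environment, and parameters are illustrative assumptions, not the algorithm proposed in the paper above:

import math, random

def pull(arm):
    # hypothetical environment: Bernoulli reward, uniform continuous cost
    reward = 1.0 if random.random() < arm["p"] else 0.0
    cost = random.uniform(0.1, arm["max_cost"])
    return reward, cost

def ucb_ratio(stats, t):
    # optimistic estimate of reward per unit cost (illustrative index form)
    bonus = math.sqrt(2 * math.log(t) / stats["n"])
    return (stats["r"] / stats["n"] + bonus) / max(stats["c"] / stats["n"], 1e-9)

arms = [{"p": 0.3, "max_cost": 0.5}, {"p": 0.6, "max_cost": 1.0}]
stats = [{"n": 1, "r": 0.0, "c": 0.5} for _ in arms]  # assumed warm-up pulls
budget, t, total_reward = 50.0, len(arms), 0.0
while budget > 0:
    t += 1
    i = max(range(len(arms)), key=lambda j: ucb_ratio(stats[j], t))
    r, c = pull(arms[i])
    if c > budget:
        break                      # stop once the next pull would overrun the budget
    budget -= c
    total_reward += r
    stats[i]["n"] += 1; stats[i]["r"] += r; stats[i]["c"] += c
print(f"total reward: {total_reward}")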
Analysis of Thompson Sampling for the Multi-armed Bandit Problem
The multi-armed bandit problem is a popular model for studying the exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to pla...
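Thompson Sampling itself is easy to state for Bernoulli arms: maintain a Beta posterior per arm, sample once from each posterior, and play the arm with the largest sample. A minimal sketch (the arm means below are illustrative):

import random

true_means = [0.2, 0.5, 0.8]   # assumed, unknown to the algorithm
alpha = [1] * 3                # Beta(1, 1) uniform priors
beta = [1] * 3

wins = 0
for t in range(5000):
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(3)]
    i = samples.index(max(samples))            # play the arm with the largest sample
    reward = 1 if random.random() < true_means[i] else 0
    wins += reward
    alpha[i] += reward                         # posterior update on observed reward
    beta[i] += 1 - reward

print(f"empirical mean reward: {wins / 5000:.3f}")   # approaches 0.8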
Publication date: 1996